This report investigates the spatio-temporal patterns of chickenpox cases in Hungary over a period of roughly 10 years. The analysis is based on weekly case counts for each of Hungary’s 20 county-level units (19 counties and the capital, Budapest), covering the period from 2005 to 2015. Chickenpox, a contagious viral infection, can cause outbreaks in both urban and rural regions. Understanding its epidemiology through temporal and spatial analyses is vital for developing effective public health strategies, including vaccination campaigns.
Objectives:
The findings of this study can help inform targeted interventions to mitigate chickenpox outbreaks in Hungary.
# Data Preparation
Before we begin the analysis, we need to load and inspect the datasets. The chickenpox case data is collected weekly for each of Hungary’s 20 counties. The adjacency matrix provides the spatial relationships between counties, which is essential for spatial autocorrelation analysis.
We will load the chickenpox data and the county adjacency matrix, inspect the structure of both datasets, and perform any necessary data cleaning or transformation.
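# Load required packages
library(dplyr)        # data manipulation and the %>% pipe
library(tidyr)        # pivot_longer()
library(ggplot2)      # plotting
library(lubridate)    # month(), week(), year()
library(plotly)       # ggplotly(), layout()
library(forecast)     # ggAcf()
library(spdep)        # mat2listw(), moran.test()
library(RColorBrewer) # brewer.pal()
# Load datasets
chickenpox_data <- read.csv("hungary_chickenpox.csv")
county_edges <- read.csv("hungary_county_edges.csv")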
# Inspect datasets
head(chickenpox_data)
## Date BUDAPEST BARANYA BACS BEKES BORSOD CSONGRAD FEJER GYOR HAJDU HEVES
## 1 03/01/2005 168 79 30 173 169 42 136 120 162 36
## 2 10/01/2005 157 60 30 92 200 53 51 70 84 28
## 3 17/01/2005 96 44 31 86 93 30 93 84 191 51
## 4 24/01/2005 163 49 43 126 46 39 52 114 107 42
## 5 31/01/2005 122 78 53 87 103 34 95 131 172 40
## 6 07/02/2005 174 76 77 152 189 26 74 181 157 44
## JASZ KOMAROM NOGRAD PEST SOMOGY SZABOLCS TOLNA VAS VESZPREM ZALA
## 1 130 57 2 178 66 64 11 29 87 68
## 2 80 50 29 141 48 29 58 53 68 26
## 3 64 46 4 157 33 33 24 18 62 44
## 4 63 54 14 107 66 50 25 21 43 31
## 5 61 49 11 124 63 56 7 47 85 60
## 6 95 97 26 146 59 54 27 54 48 60
head(county_edges)
## name_1 name_2 id_1 id_2
## 1 BACS JASZ 0 10
## 2 BACS BACS 0 0
## 3 BACS BARANYA 0 1
## 4 BACS CSONGRAD 0 5
## 5 BACS PEST 0 13
## 6 BACS FEJER 0 6
# Check structure of the datasets
str(chickenpox_data)
## 'data.frame': 522 obs. of 21 variables:
## $ Date : chr "03/01/2005" "10/01/2005" "17/01/2005" "24/01/2005" ...
## $ BUDAPEST: int 168 157 96 163 122 174 153 115 119 114 ...
## $ BARANYA : int 79 60 44 49 78 76 103 74 86 81 ...
## $ BACS : int 30 30 31 43 53 77 54 64 57 129 ...
## $ BEKES : int 173 92 86 126 87 152 192 174 171 217 ...
## $ BORSOD : int 169 200 93 46 103 189 148 140 90 167 ...
## $ CSONGRAD: int 42 53 30 39 34 26 65 56 65 64 ...
## $ FEJER : int 136 51 93 52 95 74 100 111 118 93 ...
## $ GYOR : int 120 70 84 114 131 181 118 175 105 154 ...
## $ HAJDU : int 162 84 191 107 172 157 129 138 194 119 ...
## $ HEVES : int 36 28 51 42 40 44 40 60 60 34 ...
## $ JASZ : int 130 80 64 63 61 95 88 112 67 118 ...
## $ KOMAROM : int 57 50 46 54 49 97 56 70 46 73 ...
## $ NOGRAD : int 2 29 4 14 11 26 10 21 12 6 ...
## $ PEST : int 178 141 157 107 124 146 119 178 112 130 ...
## $ SOMOGY : int 66 48 33 66 63 59 104 70 116 68 ...
## $ SZABOLCS: int 64 29 33 50 56 54 85 75 76 59 ...
## $ TOLNA : int 11 58 24 25 7 27 20 5 22 31 ...
## $ VAS : int 29 53 18 21 47 54 32 66 45 85 ...
## $ VESZPREM: int 87 68 62 43 85 48 153 149 102 96 ...
## $ ZALA : int 68 26 44 31 60 60 70 54 42 54 ...
str(county_edges)
## 'data.frame': 102 obs. of 4 variables:
## $ name_1: chr "BACS" "BACS" "BACS" "BACS" ...
## $ name_2: chr "JASZ" "BACS" "BARANYA" "CSONGRAD" ...
## $ id_1 : int 0 0 0 0 0 0 0 1 1 1 ...
## $ id_2 : int 10 0 1 5 13 6 16 1 16 14 ...
summary(chickenpox_data)
## Date BUDAPEST BARANYA BACS
## Length:522 Min. : 0.00 Min. : 0.0 Min. : 0.00
## Class :character 1st Qu.: 34.25 1st Qu.: 8.0 1st Qu.: 8.00
## Mode :character Median : 93.00 Median : 25.0 Median : 29.50
## Mean :101.25 Mean : 34.2 Mean : 37.17
## 3rd Qu.:149.00 3rd Qu.: 51.0 3rd Qu.: 53.00
## Max. :479.00 Max. :194.0 Max. :274.00
## BEKES BORSOD CSONGRAD FEJER
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 4.00 1st Qu.: 14.25 1st Qu.: 6.00 1st Qu.: 7.00
## Median : 14.00 Median : 46.50 Median : 20.50 Median : 24.00
## Mean : 28.91 Mean : 57.08 Mean : 31.49 Mean : 33.27
## 3rd Qu.: 38.75 3rd Qu.: 83.75 3rd Qu.: 47.00 3rd Qu.: 51.75
## Max. :271.00 Max. :355.00 Max. :199.00 Max. :164.00
## GYOR HAJDU HEVES JASZ
## Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 11.0 1st Qu.: 6.25 1st Qu.: 10.00
## Median : 35.00 Median : 37.0 Median : 21.00 Median : 31.00
## Mean : 41.44 Mean : 47.1 Mean : 29.69 Mean : 40.87
## 3rd Qu.: 63.00 3rd Qu.: 68.0 3rd Qu.: 41.00 3rd Qu.: 61.75
## Max. :181.00 Max. :262.0 Max. :210.00 Max. :224.00
## KOMAROM NOGRAD PEST SOMOGY
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 6.00 1st Qu.: 4.00 1st Qu.: 28.25 1st Qu.: 6.00
## Median : 19.00 Median : 15.00 Median : 81.00 Median : 20.50
## Mean : 25.64 Mean : 21.85 Mean : 86.10 Mean : 27.61
## 3rd Qu.: 39.00 3rd Qu.: 32.75 3rd Qu.:129.75 3rd Qu.: 41.00
## Max. :160.00 Max. :112.00 Max. :431.00 Max. :155.00
## SZABOLCS TOLNA VAS VESZPREM
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 6.00 1st Qu.: 4.00 1st Qu.: 3.00 1st Qu.: 7.25
## Median : 18.50 Median : 12.00 Median : 13.00 Median : 32.00
## Mean : 29.85 Mean : 20.35 Mean : 22.47 Mean : 40.64
## 3rd Qu.: 45.00 3rd Qu.: 29.00 3rd Qu.: 34.00 3rd Qu.: 59.00
## Max. :203.00 Max. :131.00 Max. :141.00 Max. :230.00
## ZALA
## Min. : 0.00
## 1st Qu.: 4.00
## Median : 13.00
## Mean : 19.87
## 3rd Qu.: 31.00
## Max. :216.00
# Data cleaning (if necessary)
# Parse Date (day/month/year) and replace missing case counts with 0
chickenpox_data$Date <- as.Date(chickenpox_data$Date, format = "%d/%m/%Y")
case_cols <- setdiff(names(chickenpox_data), "Date")
chickenpox_data[case_cols][is.na(chickenpox_data[case_cols])] <- 0 # Only case columns; Date is left untouched
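Because the `Date` column is stored in day/month/year order, a wrong format string can silently swap day and month for some rows and produce `NA` for others. The following is a minimal sketch of a sanity check, run on a few hypothetical sample dates rather than the full dataset. The helper `is_weekly()` is an illustrative name and is not part of the original analysis.

```r
# Hypothetical mini-sample mimicking the raw "Date" column of the CSV
raw_dates <- c("03/01/2005", "10/01/2005", "17/01/2005", "24/01/2005")
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")

# TRUE if every date parsed and consecutive rows are exactly 7 days apart
is_weekly <- function(d) {
  !anyNA(d) && all(as.numeric(diff(d)) == 7)
}

is_weekly(parsed)

# With month/day order instead, "17/01/2005" has no valid month and becomes NA
wrong <- as.Date(raw_dates, format = "%m/%d/%Y")
anyNA(wrong)
```

Applied to the real data, the same check could be run as `is_weekly(chickenpox_data$Date)` right after the conversion step.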
To understand the overall pattern of chickenpox cases across Hungary, we will sum the weekly cases from all counties and plot the national trend. This will give us a sense of the long-term trends in chickenpox cases across the entire country.
# Calculate national trend by summing cases across counties
national_trend <- chickenpox_data %>%
  select(-Date) %>%
  rowSums() %>%
  data.frame(Date = chickenpox_data$Date, Total_cases = .)
# Plot the national trend
ggplot(national_trend, aes(x = Date, y = Total_cases)) +
  geom_line(color = "blue") +
  labs(title = "National Trend of Chickenpox Cases in Hungary (2005-2015)",
       x = "Year", y = "Total Cases") +
  theme_minimal()
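A weekly series with strong seasonality can hide the long-term trend behind its annual swings. One possible way to expose the trend is a 52-week moving average, sketched below on a hypothetical synthetic series (`weekly_total` is invented for illustration and is not the Hungarian data). The call is written as `stats::filter()` because `dplyr` also exports a function named `filter()`.

```r
# Hypothetical weekly series: a flat level of 100 plus a 52-week seasonal cycle
weeks <- 1:156
weekly_total <- 100 + 50 * sin(2 * pi * weeks / 52)

# 52-week moving average; the window is roughly centred on each week,
# so the ends of the series, where the window does not fit, are NA
trend <- as.numeric(stats::filter(weekly_total, rep(1 / 52, 52), sides = 2))

# The seasonal cycle averages out, leaving the underlying level
range(trend, na.rm = TRUE)
```

On the real data, the same filter could be applied to `national_trend$Total_cases` and drawn as a second line on top of the weekly plot.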
By examining the total cases per month across all years, we can observe if there are any seasonal fluctuations in chickenpox cases. This type of analysis helps in identifying peak periods, which could inform vaccination campaigns or other public health initiatives.
# Extract month from the Date column
national_trend$Month <- month(national_trend$Date)
# Summarize cases by month
monthly_trend <- national_trend %>%
  group_by(Month) %>%
  summarise(Total_cases = sum(Total_cases))
# Plot monthly seasonality
ggplot(monthly_trend, aes(x = Month, y = Total_cases)) +
  geom_bar(stat = "identity", fill = "orange") +
  labs(title = "Seasonality of Chickenpox Cases (2005-2015)",
       x = "Month", y = "Total Cases") +
  scale_x_continuous(breaks = 1:12, labels = month.name) +
  theme_minimal()
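One caveat when summing weekly counts by month: calendar months contain either four or five weekly observations, so monthly sums differ slightly even when the weekly level is constant. The sketch below illustrates this on a hypothetical series of 10 cases per week (`toy` is invented for illustration) and shows that the mean weekly count per month is not affected.

```r
# Hypothetical weekly series: a constant 10 cases on every Monday of 2005
dates <- seq(as.Date("2005-01-03"), as.Date("2005-12-26"), by = "week")
toy <- data.frame(Date = dates, Total_cases = 10)
toy$Month <- as.integer(format(toy$Date, "%m"))

# Monthly sums vary with the number of Mondays that fall in each month
monthly_sum <- aggregate(Total_cases ~ Month, data = toy, FUN = sum)

# Mean weekly cases per month stay constant at 10
monthly_mean <- aggregate(Total_cases ~ Month, data = toy, FUN = mean)

unique(monthly_sum$Total_cases)
unique(monthly_mean$Total_cases)
```

In the pipeline above, this would amount to replacing `sum(Total_cases)` with `mean(Total_cases)` in `summarise()`, if average weekly burden per month is the quantity of interest.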
To compare chickenpox trends across counties, we build an interactive plot in which readers can use the legend to show, hide, or isolate specific counties and hover over data points for details.
# Select all counties to compare
all_counties <- colnames(chickenpox_data)[2:ncol(chickenpox_data)] # Extract all county names
# Reshape data for comparison
county_comparison_all <- chickenpox_data %>%
  select(Date, all_of(all_counties)) %>%
  pivot_longer(cols = -Date, names_to = "County", values_to = "Cases")
# Build the static ggplot first (Date was already parsed during data cleaning)
county_plot <- ggplot(county_comparison_all, aes(x = Date, y = Cases, color = County)) +
  geom_line() +
  labs(title = "Interactive Comparison of Chickenpox Cases by County (2005-2015)",
       x = "Date", y = "Cases") +
  theme_minimal() +
  theme(legend.position = "right", legend.title = element_text(size = 10), legend.text = element_text(size = 8))
# Convert ggplot to plotly; width and height are passed to ggplotly() because
# setting them in layout() is deprecated
interactive_plot <- ggplotly(county_plot, width = 950, height = 650) %>%
  layout(
    title = list(text = "<b>Interactive Comparison of Chickenpox Cases by County (2005-2015)</b>"),
    legend = list(title = list(text = "<b>Select Counties</b>"))
  )
# Display the interactive plot
interactive_plot
To understand how chickenpox cases are temporally correlated, we compute and visualize the autocorrelation function (ACF) for selected counties. This will help us detect patterns, such as seasonality or persistence in outbreaks.
# Select a county (e.g., BUDAPEST) for analysis
selected_county <- "BUDAPEST"
# Extract data for the selected county
county_ts <- ts(chickenpox_data[[selected_county]], frequency = 52) # Weekly data (52 weeks/year)
# Compute and plot ACF
acf_plot <- ggAcf(county_ts, lag.max = 104) + # Analyze up to 2 years (104 weeks)
  labs(title = paste("Temporal Autocorrelation of Chickenpox Cases in", selected_county),
       x = "Lag (weeks)", y = "ACF") +
  theme_minimal()
acf_plot
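To make the ACF plot easier to read, it can help to see what an annual cycle looks like numerically. The sketch below uses base R's `acf()` on a hypothetical synthetic series with a 52-week period (`toy_cases` is invented for illustration). A seasonal series of this kind shows a strong positive correlation at lag 52 and a strong negative one at lag 26.

```r
set.seed(1)

# Hypothetical weekly series: 10 years with a 52-week cycle plus noise
weeks <- 1:520
toy_cases <- 100 + 50 * sin(2 * pi * weeks / 52) + rnorm(length(weeks), sd = 5)

# Autocorrelation up to two years; a plain numeric vector keeps lags in weeks
toy_acf <- acf(toy_cases, lag.max = 104, plot = FALSE)
acf_values <- as.numeric(toy_acf$acf) # element k + 1 holds the value at lag k

lag_26 <- acf_values[27] # half a year apart: opposite phase
lag_52 <- acf_values[53] # one year apart: same phase

# Approximate 95% bounds under the null hypothesis of white noise
conf_limit <- qnorm(0.975) / sqrt(length(toy_cases))

c(lag_26 = lag_26, lag_52 = lag_52, conf_limit = conf_limit)
```

Values outside `±conf_limit` correspond to the bars that cross the dashed lines in an ACF plot.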
Spatial autocorrelation is assessed using Moran’s I to determine whether chickenpox cases in one county are similar to those in neighboring counties.
# Extract all unique county names
all_counties <- unique(c(county_edges$name_1, county_edges$name_2))
# Initialize a square adjacency matrix
adjacency_matrix <- matrix(0, nrow = length(all_counties), ncol = length(all_counties),
                           dimnames = list(all_counties, all_counties))
# Fill the adjacency matrix with 1 where there is an edge
for (i in seq_len(nrow(county_edges))) {
  row <- county_edges$name_1[i]
  col <- county_edges$name_2[i]
  adjacency_matrix[row, col] <- 1
  adjacency_matrix[col, row] <- 1 # Ensure symmetry
}
# Convert to a spatial weights list
weights <- mat2listw(adjacency_matrix, style = "W")
# Compute county-level totals for chickenpox cases
county_totals <- chickenpox_data %>%
select(-Date) %>%
colSums()
# Run Moran's I test
moran_test <- moran.test(county_totals, weights)
# Display Moran's I results
cat("Moran's I: ", moran_test$estimate[1], "\n")
## Moran's I: 0.2191511
cat("P-value: ", moran_test$p.value, "\n")
## P-value: 0.007661405
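Two details are worth checking when reproducing this test. `moran.test()` pairs the values with the weights by position, whereas `county_totals` follows the column order of `chickenpox_data` and the weights follow the order of names in `county_edges`. The edge list also contains self-pairs such as BACS-BACS, which put ones on the diagonal of the adjacency matrix. The sketch below is a hand-rolled, base-R version of Moran's I on a hypothetical four-region example. It is not the `spdep` implementation, and `morans_i()`, `edges`, and `totals` are illustrative names. It aligns values by name, removes self-neighbours, and uses row-standardised weights (the same idea as `style = "W"`):

```r
# Hypothetical edge list for four regions on a line (A-B-C-D), with self-pairs
edges <- data.frame(
  name_1 = c("A", "A", "B", "B", "C", "C", "D"),
  name_2 = c("A", "B", "B", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Build a symmetric adjacency matrix by indexing with pairs of names
regions <- sort(unique(c(edges$name_1, edges$name_2)))
adj <- matrix(0, length(regions), length(regions),
              dimnames = list(regions, regions))
adj[cbind(edges$name_1, edges$name_2)] <- 1
adj[cbind(edges$name_2, edges$name_1)] <- 1

morans_i <- function(x, adj) {
  x <- x[rownames(adj)]   # align values with the matrix by name
  diag(adj) <- 0          # a region is not its own neighbour
  w <- adj / rowSums(adj) # row-standardise (assumes every region has a neighbour)
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Values supplied in a different order than the matrix rows
totals <- c(D = 4, C = 3, B = 2, A = 1)
morans_i(totals, adj)
```

For the real data, the analogous steps would be `county_totals[rownames(adjacency_matrix)]` and `diag(adjacency_matrix) <- 0` before calling `mat2listw()`. Doing so may change the statistic reported above.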
Description: This script generates a heatmap to visualize the weekly distribution of chickenpox cases across the 20 counties in Hungary over a 10-year period (2005-2015). The data is first reshaped to include weekly totals per county. Each tile in the heatmap represents the total cases for a specific week and county, with color intensity indicating the number of cases. The visualization helps identify temporal and spatial patterns, such as peaks and hotspots of chickenpox outbreaks.
# Reshape data to long format and derive the week of the year
heatmap_data <- chickenpox_data %>%
  select(Date, all_of(all_counties)) %>%
  pivot_longer(cols = -Date, names_to = "County", values_to = "Cases") %>%
  mutate(Week = week(Date))
# Sum cases per week of the year and county across all years, so that each
# (Week, County) tile in the heatmap corresponds to exactly one value
heatmap_data <- heatmap_data %>%
  group_by(Week, County) %>%
  summarise(Total_cases = sum(Cases), .groups = "drop")
# Create a heatmap plot
heatmap_plot <- ggplot(heatmap_data, aes(x = Week, y = County, fill = Total_cases)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "brown") +
  labs(title = "Heatmap of Chickenpox Cases by County and Week (2005-2015)",
       x = "Week of Year", y = "County", fill = "Total Cases") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) # Rotate x-axis labels for clarity
# Display the heatmap
heatmap_plot
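When reading the right-hand edge of this heatmap, note that `lubridate::week()` counts complete seven-day periods since 1 January, so week 53 covers only the last one or two days of a year. The sketch below reproduces that week numbering in base R for a hypothetical sequence of 522 consecutive Mondays starting on 3 January 2005. This assumes the dataset has no gaps, which the source does not state. It then counts how many observations fall into each week of the year.

```r
# Hypothetical date sequence: 522 consecutive Mondays from 2005-01-03
dates <- seq(as.Date("2005-01-03"), by = "week", length.out = 522)

# Same definition as lubridate::week(): full 7-day periods since 1 January, plus one
week_of_year <- as.POSIXlt(dates)$yday %/% 7 + 1

# Number of weekly observations that land in each week of the year
obs_per_week <- table(week_of_year)
obs_per_week[c("1", "53")]
```

Week 53 collects far fewer observations than the other weeks, so a tile based on summed cases will look pale there regardless of disease activity.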
Description: This script creates a detailed heatmap visualizing the monthly distribution of chickenpox cases across Hungary’s counties over multiple years. The data is aggregated to show total cases for each month and year, organized by county. Each subplot represents a county, with colors indicating case intensity using the “YlOrRd” palette for better distinction of peaks. This visualization highlights intra-annual seasonality and temporal patterns of chickenpox outbreaks, offering insights into seasonal trends within each county.
# Calculate total cases by year and month for each county
seasonality_data <- chickenpox_data %>%
  select(Date, all_of(all_counties)) %>%
  pivot_longer(cols = -Date, names_to = "County", values_to = "Cases") %>%
  mutate(Year = year(Date),
         Month = month(Date)) %>%
  group_by(Year, Month, County) %>%
  summarise(Total_cases = sum(Cases), .groups = "drop")
ggplot(seasonality_data, aes(x = Month, y = Year, fill = Total_cases)) +
  geom_tile() +
  scale_fill_gradientn(colors = brewer.pal(9, "YlOrRd")) + # Using the YlOrRd palette for the color scale
  facet_wrap(~ County, scales = "free_y") +
  labs(title = "Heatmap of Monthly Chickenpox Cases by County",
       x = "Month", y = "Year", fill = "Total Cases") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        strip.text = element_text(size = 10)) # Adjust size of facet labels for better readability
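All facets share one fill scale, so counties with low absolute counts can appear uniformly pale next to Budapest or Pest, which hides their seasonal shape. One possible remedy is to scale each county's values by that county's own maximum. The sketch below shows this on a small hypothetical table. The values and the `Relative_cases` column are invented for illustration, and the approach assumes every county has at least one non-zero month.

```r
# Hypothetical monthly totals for a large and a small county
toy <- data.frame(
  County = rep(c("BUDAPEST", "NOGRAD"), each = 3),
  Month = rep(1:3, times = 2),
  Total_cases = c(400, 200, 100, 20, 10, 5)
)

# Divide each value by the maximum of its own county (ave() works group-wise)
toy$Relative_cases <- toy$Total_cases / ave(toy$Total_cases, toy$County, FUN = max)

toy
```

Mapping `fill` to such a relative column would make the seasonal pattern comparable across facets, at the cost of no longer showing absolute case numbers.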
This study examined the spatio-temporal patterns of chickenpox cases in Hungary over a 10-year period. The analysis revealed clear trends in the spread and intensity of outbreaks, both nationally and across individual counties. Seasonal patterns were evident, with cases peaking during specific months, indicating strong intra-annual seasonality. The comparison across counties showed differences in case numbers, suggesting that some regions experienced higher or more frequent outbreaks than others.
Temporal and spatial autocorrelation analyses further showed that chickenpox case counts depend on both time and location. In particular, the Moran’s I test on county totals (I ≈ 0.22, p ≈ 0.008) points to positive spatial autocorrelation, meaning that neighboring counties tend to have similar case totals. These insights can support public health strategies, such as targeted vaccination campaigns, to reduce the spread of chickenpox in high-risk areas and during peak seasons. Overall, the findings emphasize the importance of combining spatial and temporal data for understanding disease dynamics and improving healthcare planning.